1. Introduction
arXiv is a free distribution service and open-access archive for nearly 2.5 million scholarly articles in the fields of: (i) physics, (ii) mathematics, (iii) quantitative biology, (iv) computer science, (v) quantitative finance, (vi) statistics, (vii) electrical engineering and systems science, and (viii) economics.
When publishing an article on arXiv, an author must select the most applicable field and subject area. For example, an author can choose the field of computer science and the subject area artificial intelligence. Authors can also select multiple areas by cross-listing an article under additional subject areas.
Much like data science, the subject areas on arXiv have expanded over the years, and their definitions have evolved. As a result, users now find it harder to locate articles of interest.
We load the necessary libraries:
library(tidyverse)
library(tidymodels)
library(learntidymodels)
library(tidytext)
library(data.table)
library(tidyr)
library(dplyr)
library(here)
library(quanteda)
library(tm)
library(stm)
library(lda)
library(ldatuning)
library(skimr)
library(SnowballC)
library(Matrix)
library(text2vec)
library(textstem)
library(parallel)
library(doMC)
library(parallelly)
library(doParallel)
library(kableExtra)
library(ggthemes)
library(furrr)
library(DT)
2. Data overview
We start by loading the data and taking a first look at it:
# A tibble: 131,565 × 4
Date Title Abstract Subject_area
<chr> <chr> <chr> <chr>
1 26/12/2009 A User's Guide to Zot "Zot is… LO
2 05/10/2009 Prediction of Zoonosis Incidence in Human u… "Zoonos… LG
3 08/05/2015 Wireless Multicast for Zoomable Video Strea… "Zoomab… NI
4 09/12/2015 On Computing the Minkowski Difference of Zo… "Zonoto… CG
5 26/01/2007 The Zones Algorithm for Finding Points-Near… "Zones … DB
6 21/02/2017 Occupancy Counting with Burst and Intermitt… "Zone-l… NI
7 16/12/2009 Zone Diagrams in Euclidean Spaces and in Ot… "Zone d… CG
8 08/05/2015 Decomposition of Power Flow Used for Optimi… "Zonal … CE
9 23/12/2008 Some sufficient conditions on Hamiltonian d… "Z-mapp… DM
10 25/03/2013 ZKCM: a C++ library for multiprecision matr… "ZKCM i… MS
# ℹ 131,555 more rows
The data contain 131,565 abstracts. The preview above shows four variables; the summary below also includes an ID column, for five variables in total.
A quick summary with skimr shows 894 missing values in the Subject_area variable, a single missing abstract, and 131,384 unique abstracts.
Name | Piped data |
---|---|
Number of rows | 131565 |
Number of columns | 5 |
Key | NULL |
Column type frequency: character | 5 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
ID | 0 | 1.00 | 11 | 14 | 0 | 131565 | 0 |
Date | 0 | 1.00 | 10 | 10 | 0 | 6176 | 0 |
Title | 0 | 1.00 | 4 | 256 | 0 | 131346 | 0 |
Abstract | 1 | 1.00 | 9 | 3806 | 0 | 131384 | 0 |
Subject_area | 894 | 0.99 | 2 | 2 | 0 | 39 | 0 |
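If you want to double-check the n_missing and n_unique columns without skimr, both counts are easy to reproduce in base R. The sketch below uses a hypothetical four-row data frame standing in for the ArXiv tibble; n_unique is computed as the number of distinct non-missing values, which we assume matches skimr's definition.

```r
# Hypothetical toy data standing in for the ArXiv tibble
arxiv_toy <- data.frame(
  Abstract     = c("text a", "text b", "text b", NA),
  Subject_area = c("LO", NA, "LG", "NI"),
  stringsAsFactors = FALSE
)

# Per-column count of missing values and of distinct non-missing values
missing_summary <- data.frame(
  variable  = names(arxiv_toy),
  n_missing = vapply(arxiv_toy, function(x) sum(is.na(x)), integer(1)),
  n_unique  = vapply(arxiv_toy, function(x) length(unique(x[!is.na(x)])), integer(1))
)
missing_summary
```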
We will try something different today: both topic modeling and clustering. I know it might seem a little redundant, since topic modeling already groups texts in a way, but bear with me🙏.
To achieve this, we will follow an approach that can be summarized in two steps:
- conversion of the character data into numerical data, since most algorithms work with numbers rather than text;
- dimension reduction, from a large vocabulary of words to a much smaller set of topics.
3. Topic modeling
3.1. Conversion of text data into numerical data: corpus extraction
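arxiv_corpus <- ArXiv %>%
  na.omit() %>%
  distinct(Abstract) %>%
  # Regex tokenizer that keeps hyphenated words such as "state-of-the-art";
  # drop = FALSE keeps the Abstract column, used later as the document ID
  unnest_tokens(word, Abstract, token = stringr::str_extract_all,
                drop = FALSE,
                pattern = "\\b\\w[-\\w]*\\b") %>%
  mutate(word = lemmatize_words(word)) %>%
  # get_stopwords() defaults to the Snowball list, so one anti-join is enough
  anti_join(get_stopwords(source = "snowball"), by = "word") %>%
  filter(word != "") %>%
  filter(!str_detect(word, "\\d")) %>%  # drop tokens containing digits
  filter(nchar(word) >= 2)              # drop one-character tokens

vocabulary <- arxiv_corpus %>%
  distinct(word)

arxiv_corpus %>%
  count(word, sort = TRUE)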
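To make the filtering chain above more concrete, here is a base-R sketch of the same steps on a single made-up sentence: tokenization with the same regex, stop-word removal, and the digit and length filters. The tiny stop-word vector is hypothetical (the post uses tidytext's get_stopwords()), and lemmatization is skipped.

```r
# Made-up abstract and a tiny, hypothetical stop-word list
abstract   <- "We propose a state-of-the-art method for 3D graph-based learning in 2024."
stop_words <- c("the", "of", "a", "we", "in", "for")

# Same pattern as above: a word character followed by word characters or hyphens
matches <- gregexpr("\\b\\w[-\\w]*\\b", abstract, perl = TRUE)
tokens  <- tolower(regmatches(abstract, matches)[[1]])

tokens <- tokens[!tokens %in% stop_words]  # stop-word removal
tokens <- tokens[!grepl("\\d", tokens)]    # drop tokens containing digits ("3d", "2024")
tokens <- tokens[nchar(tokens) >= 2]       # drop one-character tokens
tokens
```

Hyphenated terms survive as single tokens, which is why words such as real-time can show up as topic terms later on.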
Next, we build a document-term matrix from arxiv_corpus and filter out rare words to reduce the size of the vocabulary. We tried two thresholds, keeping only words that appear more than 100 times or more than 500 times in the corpus, and finally settled on the 100-occurrence threshold:
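# Keep only words that occur more than 100 times in the whole corpus
tidy_arxiv <- arxiv_corpus %>%
  add_count(word) %>%
  filter(n > 100) %>%
  select(-n)

# Same filter with a stricter threshold of 500 occurrences
tidy_arxiv_500 <- arxiv_corpus %>%
  add_count(word) %>%
  filter(n > 500) %>%
  select(-n)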
# Document-term matrices: one row per abstract, one column per word, cells = counts
arxiv_sparse <- tidy_arxiv %>%
  count(Abstract, word) %>%
  cast_sparse(Abstract, word, n)

arxiv_sparse_500 <- tidy_arxiv_500 %>%
  count(Abstract, word) %>%
  cast_sparse(Abstract, word, n)

arxiv_corpus_sparse <- arxiv_corpus %>%
  count(Abstract, word) %>%
  cast_sparse(Abstract, word, n)
The dimensions of the sparse matrices are:
- original corpus: 130,491 rows and 166,150 columns;
- words appearing more than 100 times in the corpus: 130,491 rows and 6,284 columns;
- words appearing more than 500 times in the corpus: 130,488 rows and 2,604 columns (three abstracts lose all of their words under this stricter filter).
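The frequency filter and the cast_sparse() step can be sketched with base R and the Matrix package, which may help when checking the dimensions above. The toy tokens and the min_count value are hypothetical. Note how abstract d3, whose only words fall below the threshold, disappears from the matrix entirely: this is the same mechanism that drops three abstracts under the 500-occurrence filter.

```r
library(Matrix)

# Hypothetical long-format tokens: one row per (abstract, word) occurrence
tokens <- data.frame(
  Abstract = c("d1", "d1", "d1", "d1", "d2", "d2", "d3", "d3", "d4"),
  word     = c("graph", "node", "node", "edge", "graph", "node", "edge", "tree", "graph"),
  stringsAsFactors = FALSE
)
min_count <- 2  # stands in for the 100 / 500 thresholds used in the post

# Keep words whose total corpus frequency is strictly above the threshold
word_freq <- table(tokens$word)
kept      <- tokens[tokens$word %in% names(word_freq)[word_freq > min_count], ]

# Cast to a sparse document-term matrix; repeated (i, j) pairs are summed into counts
docs  <- factor(kept$Abstract)
words <- factor(kept$word)
dtm <- sparseMatrix(i = as.integer(docs), j = as.integer(words), x = 1,
                    dims = c(nlevels(docs), nlevels(words)),
                    dimnames = list(levels(docs), levels(words)))
dtm
```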
3.2. Dimension reduction: train and evaluate topic models
Now we start training. Because we cannot know in advance how many topics our dataset contains, we fit models with several values of K and select one based on their performance.
With 130,491 documents, training took over 16 hours😆, so if you want to reproduce it, don't be in a hurry.
I'd also like to give credit where it's due to Julia Silge 🙏🏼 for her immense work and guidance.
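set.seed(2024)
# future_map() dispatches work through the future framework, so we set a future
# plan; registerDoMC() only affects foreach-based code and would leave this sequential
plan(multisession, workers = max(1, availableCores() - 1))

many_models <- tibble(K = c(20, 40, 50, 60, 70, 80, 100)) %>%
  mutate(topic_model = future_map(K, ~ stm(arxiv_sparse, K = .x, verbose = FALSE),
                                  .options = furrr_options(seed = TRUE)))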
We now have topic models for several values of K. We can assess their performance and decide which K is most appropriate for our data, based on metrics such as semantic coherence, exclusivity, residuals and held-out likelihood.
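Semantic coherence rewards topics whose most probable words tend to appear in the same documents. Below is a minimal base-R sketch of the co-document-frequency score of Mimno et al. (2011), which, as far as we know, is what stm's semanticCoherence() is built on; the toy binary document-term matrix and the coherence() helper are hypothetical, and stm's own implementation may differ in details.

```r
# Toy binary document-term matrix: rows = documents, columns = words (hypothetical)
dtm <- matrix(c(1, 1, 0,
                1, 1, 1,
                0, 1, 1,
                1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("graph", "node", "edge")))

# Hypothetical helper: for each word pair (later word m, earlier word l),
# add log((number of documents containing both + 1) / documents containing word l)
coherence <- function(dtm, top_words) {
  present <- dtm[, top_words, drop = FALSE] > 0
  score <- 0
  for (m in seq_along(top_words)[-1]) {
    for (l in seq_len(m - 1)) {
      co_docs <- sum(present[, m] & present[, l])
      score <- score + log((co_docs + 1) / sum(present[, l]))
    }
  }
  score
}

coherence(dtm, c("graph", "node"))  # words that often co-occur: higher score
coherence(dtm, c("graph", "edge"))  # words that rarely co-occur: lower score
```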
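# Hold out some of the words in part of the documents to compute held-out likelihood.
# Caveat: for a strict evaluation, the models would be fitted on heldout$documents
# (with heldout$vocab) rather than on the full arxiv_sparse matrix.
heldout <- make.heldout(arxiv_sparse)

k_result <- many_models %>%
  mutate(exclusivity        = map(topic_model, exclusivity),
         semantic_coherence = map(topic_model, semanticCoherence, arxiv_sparse),
         eval_heldout       = map(topic_model, eval.heldout, heldout$missing),
         residual           = map(topic_model, checkResiduals, arxiv_sparse),
         bound  = map_dbl(topic_model, function(x) max(x$convergence$bound)),
         # log(K!) correction added to the bound, reported as lbound
         lfact  = map_dbl(topic_model, function(x) lfactorial(x$settings$dim$K)),
         lbound = bound + lfact,
         iterations = map_dbl(topic_model, function(x) length(x$convergence$bound)))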
Let's now look at some diagnostic plots to help us choose the best K value:
The residuals are lowest at K = 100, and the held-out likelihood is also highest at K = 100.
As for semantic coherence, a model can score high simply by having a few topics dominated by very common words, so it is good practice to look for a balance between exclusivity and semantic coherence.
Let's take our minimum, mean, and maximum K values for this comparison:
If we set a threshold of 9.8 on exclusivity, K = 100 appears to be the most appropriate choice here.
However, since K = 100 is also the largest value we tried, the first plot suggests that extending the range to 120 and 140 could be worthwhile. Maybe in another analysis; for now, let's stick to K = 100.
*`A topic model with 100 topics, 130491 documents and a 6284 word dictionary.`*
Let's have a look at our reduced data. Because we replaced the word columns with a much smaller number of topics, we get a matrix where:
- each row represents an abstract;
- each column represents a topic;
- each value is the estimated proportion of the abstract attributed to that topic, so each row sums to 1.
# A tibble: 130,491 × 101
document `Topic 1` `Topic 2` `Topic 3` `Topic 4` `Topic 5` `Topic 6`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 "!-graphs provid… 0.00253 0.00227 0.00111 0.00738 0.0475 0.0332
2 "\" How well con… 0.00249 0.00331 0.00130 0.0145 0.00292 0.00646
3 "\" Yet another … 0.00241 0.00154 0.00114 0.0163 0.00265 0.00237
4 "\"Academia 2.0\… 0.00369 0.00147 0.00127 0.00377 0.00103 0.000324
5 "\"Amplify and F… 0.0205 0.00127 0.00130 0.0113 0.00328 0.000501
6 "\"Background su… 0.00361 0.00354 0.00185 0.0102 0.00224 0.00137
7 "\"Bibliometrics… 0.00270 0.00176 0.000860 0.00520 0.00111 0.00138
8 "\"Big Data is t… 0.00404 0.00187 0.00108 0.00305 0.00455 0.000274
9 "\"Big data\" ha… 0.00209 0.00159 0.00106 0.00477 0.000911 0.00246
10 "\"Citation clas… 0.00338 0.00197 0.000635 0.00313 0.00250 0.00176
# ℹ 130,481 more rows
# ℹ 94 more variables: `Topic 7` <dbl>, `Topic 8` <dbl>, `Topic 9` <dbl>,
# `Topic 10` <dbl>, `Topic 11` <dbl>, `Topic 12` <dbl>, `Topic 13` <dbl>,
# `Topic 14` <dbl>, `Topic 15` <dbl>, `Topic 16` <dbl>, `Topic 17` <dbl>,
# `Topic 18` <dbl>, `Topic 19` <dbl>, `Topic 20` <dbl>, `Topic 21` <dbl>,
# `Topic 22` <dbl>, `Topic 23` <dbl>, `Topic 24` <dbl>, `Topic 25` <dbl>,
# `Topic 26` <dbl>, `Topic 27` <dbl>, `Topic 28` <dbl>, `Topic 29` <dbl>, …
Now we have a standard tibble that we can easily manipulate as we wish.
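One way to go from a long (document, topic, gamma) table, like the one tidy() returns, to the wide document-by-topic layout printed above is a simple cross-tabulation. The post's own reshaping code is not shown, so this base-R sketch with xtabs() and hypothetical values is only an illustration.

```r
# Hypothetical long-format gamma table: one row per (document, topic)
tidy_gamma_toy <- data.frame(
  document = rep(c("abstract A", "abstract B"), each = 3),
  topic    = rep(1:3, times = 2),
  gamma    = c(0.70, 0.20, 0.10,
               0.05, 0.15, 0.80)
)

# Cross-tabulate into a document x topic matrix of topic proportions
theta <- xtabs(gamma ~ document + topic, data = tidy_gamma_toy)
colnames(theta) <- paste("Topic", colnames(theta))
theta

# Each abstract's topic proportions sum to 1 (up to floating-point error)
rowSums(theta)
```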
Before going further, let's feast our eyes on some graphs showing the words that make up each topic.
We start by extracting the beta (per-topic word probabilities) and gamma (per-document topic proportions) matrices:
topic term beta
<int> <char> <num>
1: 1 abstract 5.468170e-12
2: 2 abstract 3.060442e-12
3: 3 abstract 6.786207e-13
4: 4 abstract 1.917353e-15
5: 5 abstract 9.713773e-17
---
628396: 96 swipt 1.311521e-57
628397: 97 swipt 3.176655e-50
628398: 98 swipt 7.467630e-64
628399: 99 swipt 5.360673e-67
628400: 100 swipt 2.414906e-47
tidy_gamma <- tidy(topic_model_final,
                   matrix = "gamma",
                   document_names = rownames(arxiv_sparse))
We extract the top terms from the beta matrix:
topic | terms |
---|---|
1 | error, due, take, correct, crucial, modification, author, claim |
2 | path, long, short, note, frame, find, along, travel |
3 | robust, hold, uncertainty, robustness, uncertain, swarm, artificial, ca |
4 | problem, solution, solve, constraint, optimal, optimization, consider, find |
5 | condition, length, field, sufficient, family, necessary, generator, give |
6 | graph, direct, subgraph, show, parameterize, induce, bipartite, result |
7 | network, link, topology, layer, connect, communication, connectivity, wireless |
8 | et, al, de, recently, work, se, la, recent |
9 | set, measure, define, concept, order, represent, introduce, subset |
10 | system, hybrid, paper, operate, base, present, dynamical, provide |
11 | propose, parameter, filter, optimization, convergence, adaptive, stochastic, performance |
12 | code, decode, block, construction, decoder, encode, use, propose |
13 | edge, vertex, numb, every, emph, color, connect, cycle |
14 | information, context, side, mutual, available, source, can, additional |
15 | node, route, delay, traffic, packet, sensor, protocol, wireless |
16 | interference, channel, user, transmitter, receiver, primary, secondary, csi |
17 | human, computer, world, machine, understand, intelligence, artificial, ability |
18 | dataset, video, visual, question, temporal, answer, action, task |
19 | dynamic, change, behavior, individual, evolution, static, evolve, response |
20 | web, content, user, cache, video, page, site, stream |
21 | implementation, parallel, scale, implement, use, core, large, fast |
22 | face, recognition, localization, variation, local, descriptor, person, facial |
23 | impact, reference, science, scientific, study, paper, publish, bias |
24 | function, value, loss, continuous, hash, sensitivity, integral, show |
25 | strategy, game, equilibrium, player, nash, play, show, study |
26 | matrix, kernel, vector, subspace, space, projection, norm, spectral |
27 | analysis, use, tool, case, present, example, provide, describe |
28 | program, abstract, execution, functional, abstraction, language, use, support |
29 | policy, article, name, request, serve, copy, reward, push |
30 | signal, sense, sparse, measurement, reconstruction, recovery, recover, sparsity |
31 | network, neural, train, deep, convolutional, layer, architecture, cnn |
32 | student, team, volume, university, course, collaborative, collaboration, member |
33 | state, initial, configuration, transition, free, art, phase, net |
34 | bound, bind, low, upper, numb, tight, optimal, case |
35 | estimate, estimation, noise, mean, observation, gaussian, unknown, statistical |
36 | couple, spatial, cell, simulation, use, biological, material, brain |
37 | word, translation, task, sentence, representation, recurrent, embed, embeddings |
38 | structure, community, complex, structural, overlap, topological, underlie, identify |
39 | log, tree, omega, string, bit, size, use, give |
40 | property, standard, check, verification, verify, formal, specification, ensure |
41 | type, object, recursive, dependent, different, primitive, generic, session |
42 | scheme, message, share, propose, broadcast, send, secret, key |
43 | linear, vector, chain, markov, mix, integer, combination, quadratic |
44 | complexity, computational, reduction, show, hard, result, question, count |
45 | knowledge, detection, text, document, detect, entity, extract, base |
46 | relay, transmission, transmit, power, cooperative, outage, optimal, propose |
47 | compute, polynomial, transform, equation, discrete, decomposition, use, real |
48 | environment, robot, plan, vehicle, trajectory, use, robotic, reinforcement |
49 | logic, proof, theorem, calculus, prove, term, formula, logical |
50 | feature, classification, use, classifier, recognition, speech, performance, classify |
51 | process, framework, level, event, unify, base, propose, allow |
52 | distribution, probability, random, sample, entropy, density, expect, conditional |
53 | technology, organization, business, product, patient, customer, health, company |
54 | quality, evaluation, project, study, effort, practice, result, methodology |
55 | image, segmentation, use, propose, resolution, pixel, result, color |
56 | cluster, pattern, mine, hierarchical, discover, similarity, discovery, propose |
57 | approach, technique, use, procedure, instance, heuristic, new, novel |
58 | sequence, correlation, compression, number, shift, period, randomness, protein |
59 | service, resource, cloud, application, compute, management, infrastructure, provide |
60 | class, finite, automaton, regular, show, infinite, characterization, word |
61 | map, mathcal, leave, alpha, mathbb, frac, let, right |
62 | semantic, reason, relation, interpretation, express, description, trace, equivalence |
63 | device, mobile, location, use, sensor, monitor, application, can |
64 | match, rank, pair, list, stable, competitive, preference, assignment |
65 | language, natural, dependency, tag, use, grammar, parse, linguistic |
66 | control, task, schedule, controller, stability, real-time, switch, mode |
67 | datum, amount, large, big, collect, source, real, record |
68 | input, generate, component, output, produce, generation, stage, two |
69 | time, space, run, update, numb, stream, interval, can |
70 | learn, label, task, train, representation, domain, machine, can |
71 | user, security, privacy, key, secure, recommendation, use, trust |
72 | store, digital, storage, access, file, library, write, use |
73 | social, user, online, medium, study, much, people, twitter |
74 | model, capture, can, use, fit, parameter, result, predictive |
75 | region, partition, lattice, mu, lambda, nest, two, fine |
76 | power, energy, consumption, load, grid, circuit, propose, design |
77 | user, spectrum, access, base, station, cellular, propose, allocation |
78 | method, propose, use, base, prediction, accuracy, compare, result |
79 | mechanism, price, utility, market, auction, allocation, good, budget |
80 | channel, rate, capacity, source, gaussian, achievable, show, feedback |
81 | distribute, local, communication, global, failure, fault, consensus, reliability |
82 | point, dimension, line, shape, geometric, curve, plane, boundary |
83 | channel, frequency, antenna, performance, mimo, propose, signal, design |
84 | agent, action, assumption, may, knowledge, can, evidence, belief |
85 | weight, distance, metric, binary, sum, ensemble, generalize, average |
86 | computation, protocol, classical, quantum, communication, can, use, agreement |
87 | size, variable, prove, result, threshold, show, strong, exponential |
88 | object, track, map, scene, pose, detection, camera, motion |
89 | node, inference, probabilistic, bayesian, propagation, influence, spread, graphical |
90 | test, group, item, hypothesis, numb, case, bin, one |
91 | search, query, database, index, retrieval, engine, result, answer |
92 | decision, rule, make, choice, candidate, choose, aggregate, vote |
93 | research, discuss, paper, work, application, focus, issue, recent |
94 | approximation, epsilon, factor, constant, approximate, give, ratio, delta |
95 | algorithm, good, online, efficient, present, new, much, show |
96 | attack, security, can, detection, adversary, detect, malicious, attacker |
97 | performance, memory, cost, reduce, high, hardware, increase, can |
98 | design, software, development, engineer, architecture, module, source, build |
99 | theory, representation, notion, operation, domain, mathematical, construction, algebra |
100 | new, paper, good, use, present, base, can, introduce |
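The code that builds top_terms is not shown; a minimal base-R sketch of the idea, sorting each topic's words by beta and pasting the first few together, could look like this. The toy beta values and the top_n_terms() helper are hypothetical.

```r
# Hypothetical long-format beta table: one row per (topic, term)
tidy_beta_toy <- data.frame(
  topic = rep(1:2, each = 4),
  term  = rep(c("graph", "node", "channel", "user"), times = 2),
  beta  = c(0.50, 0.30, 0.15, 0.05,
            0.05, 0.10, 0.45, 0.40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n most probable terms of each topic, as one string
top_n_terms <- function(beta_tbl, n = 3) {
  per_topic <- split(beta_tbl, beta_tbl$topic)
  terms <- vapply(per_topic, function(d) {
    d <- d[order(d$beta, decreasing = TRUE), ]
    paste(head(d$term, n), collapse = ", ")
  }, character(1))
  data.frame(topic = as.integer(names(terms)), terms = unname(terms),
             stringsAsFactors = FALSE)
}

top_terms_toy <- top_n_terms(tidy_beta_toy, n = 2)
top_terms_toy
```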
We join the mean gamma of each topic with its top terms:
gamma_terms <- tidy_gamma %>%
  group_by(topic) %>%
  summarise(gamma = mean(gamma)) %>%
  arrange(desc(gamma)) %>%
  left_join(top_terms, by = "topic") %>%
  mutate(topic = paste0("Topic ", topic),
         topic = reorder(topic, gamma))
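The same aggregation and join can be mimicked in base R with aggregate() and merge(), shown here on hypothetical toy values. Because each abstract's topic proportions sum to 1, the per-topic means also sum to 1, which is why the mean gamma can be read as each topic's share of the corpus.

```r
# Hypothetical per-document topic proportions (2 abstracts x 2 topics)
tidy_gamma_toy <- data.frame(
  document = rep(c("a1", "a2"), each = 2),
  topic    = rep(1:2, times = 2),
  gamma    = c(0.9, 0.1,
               0.3, 0.7)
)
top_terms_toy <- data.frame(topic = 1:2,
                            terms = c("graph, node", "channel, user"),
                            stringsAsFactors = FALSE)

# Mean proportion of each topic across documents, joined with its top terms
gamma_terms_toy <- aggregate(gamma ~ topic, data = tidy_gamma_toy, FUN = mean)
gamma_terms_toy <- merge(gamma_terms_toy, top_terms_toy, by = "topic", all.x = TRUE)
gamma_terms_toy <- gamma_terms_toy[order(gamma_terms_toy$gamma, decreasing = TRUE), ]
gamma_terms_toy$topic <- paste0("Topic ", gamma_terms_toy$topic)
gamma_terms_toy
```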
Then the plot:
The most prevalent topic, Topic 93, is made of generic research vocabulary: research, discuss, paper, work, application, recent, etc.
To label each topic, we could use the first three words that make it up.
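Building such labels is a small string operation. The sketch below takes two top-term strings copied from the table that follows and keeps their first three words; the label_topic() helper and the " / " separator are hypothetical choices.

```r
# Top-term strings for Topics 93 and 4, copied from the table of top words
terms <- c("research, discuss, paper, work, application, focus, issue, recent",
           "problem, solution, solve, constraint, optimal, optimization, consider, find")

# Hypothetical helper: label a topic with its first n terms
label_topic <- function(terms, n = 3) {
  vapply(strsplit(terms, ",\\s*"),
         function(words) paste(head(words, n), collapse = " / "),
         character(1))
}

label_topic(terms)
# "research / discuss / paper" "problem / solution / solve"
```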
gamma_terms %>%
  select(topic, gamma, terms) %>%
  kable(digits = 3,
        col.names = c("Topic", "Topic proportion", "Top 8 words")) %>%
  kable_styling(full_width = FALSE) %>%
  row_spec(1, background = "white")
Topic | Topic proportion | Top 8 words |
---|---|---|
Topic 93 | 0.032 | research, discuss, paper, work, application, focus, issue, recent |
Topic 4 | 0.023 | problem, solution, solve, constraint, optimal, optimization, consider, find |
Topic 95 | 0.020 | algorithm, good, online, efficient, present, new, much, show |
Topic 78 | 0.020 | method, propose, use, base, prediction, accuracy, compare, result |
Topic 57 | 0.019 | approach, technique, use, procedure, instance, heuristic, new, novel |
Topic 70 | 0.018 | learn, label, task, train, representation, domain, machine, can |
Topic 27 | 0.017 | analysis, use, tool, case, present, example, provide, describe |
Topic 9 | 0.017 | set, measure, define, concept, order, represent, introduce, subset |
Topic 31 | 0.016 | network, neural, train, deep, convolutional, layer, architecture, cnn |
Topic 97 | 0.015 | performance, memory, cost, reduce, high, hardware, increase, can |
Topic 83 | 0.014 | channel, frequency, antenna, performance, mimo, propose, signal, design |
Topic 80 | 0.014 | channel, rate, capacity, source, gaussian, achievable, show, feedback |
Topic 59 | 0.014 | service, resource, cloud, application, compute, management, infrastructure, provide |
Topic 87 | 0.013 | size, variable, prove, result, threshold, show, strong, exponential |
Topic 54 | 0.013 | quality, evaluation, project, study, effort, practice, result, methodology |
Topic 99 | 0.013 | theory, representation, notion, operation, domain, mathematical, construction, algebra |
Topic 49 | 0.013 | logic, proof, theorem, calculus, prove, term, formula, logical |
Topic 50 | 0.013 | feature, classification, use, classifier, recognition, speech, performance, classify |
Topic 11 | 0.012 | propose, parameter, filter, optimization, convergence, adaptive, stochastic, performance |
Topic 74 | 0.012 | model, capture, can, use, fit, parameter, result, predictive |
Topic 60 | 0.012 | class, finite, automaton, regular, show, infinite, characterization, word |
Topic 73 | 0.012 | social, user, online, medium, study, much, people, twitter |
Topic 94 | 0.012 | approximation, epsilon, factor, constant, approximate, give, ratio, delta |
Topic 55 | 0.012 | image, segmentation, use, propose, resolution, pixel, result, color |
Topic 12 | 0.012 | code, decode, block, construction, decoder, encode, use, propose |
Topic 21 | 0.012 | implementation, parallel, scale, implement, use, core, large, fast |
Topic 34 | 0.012 | bound, bind, low, upper, numb, tight, optimal, case |
Topic 15 | 0.012 | node, route, delay, traffic, packet, sensor, protocol, wireless |
Topic 67 | 0.012 | datum, amount, large, big, collect, source, real, record |
Topic 82 | 0.011 | point, dimension, line, shape, geometric, curve, plane, boundary |
Topic 52 | 0.011 | distribution, probability, random, sample, entropy, density, expect, conditional |
Topic 88 | 0.011 | object, track, map, scene, pose, detection, camera, motion |
Topic 47 | 0.011 | compute, polynomial, transform, equation, discrete, decomposition, use, real |
Topic 44 | 0.011 | complexity, computational, reduction, show, hard, result, question, count |
Topic 77 | 0.011 | user, spectrum, access, base, station, cellular, propose, allocation |
Topic 28 | 0.011 | program, abstract, execution, functional, abstraction, language, use, support |
Topic 98 | 0.011 | design, software, development, engineer, architecture, module, source, build |
Topic 7 | 0.011 | network, link, topology, layer, connect, communication, connectivity, wireless |
Topic 69 | 0.011 | time, space, run, update, numb, stream, interval, can |
Topic 51 | 0.010 | process, framework, level, event, unify, base, propose, allow |
Topic 48 | 0.010 | environment, robot, plan, vehicle, trajectory, use, robotic, reinforcement |
Topic 13 | 0.010 | edge, vertex, numb, every, emph, color, connect, cycle |
Topic 53 | 0.010 | technology, organization, business, product, patient, customer, health, company |
Topic 10 | 0.010 | system, hybrid, paper, operate, base, present, dynamical, provide |
Topic 71 | 0.010 | user, security, privacy, key, secure, recommendation, use, trust |
Topic 35 | 0.010 | estimate, estimation, noise, mean, observation, gaussian, unknown, statistical |
Topic 45 | 0.010 | knowledge, detection, text, document, detect, entity, extract, base |
Topic 46 | 0.010 | relay, transmission, transmit, power, cooperative, outage, optimal, propose |
Topic 23 | 0.010 | impact, reference, science, scientific, study, paper, publish, bias |
Topic 84 | 0.010 | agent, action, assumption, may, knowledge, can, evidence, belief |
Topic 17 | 0.010 | human, computer, world, machine, understand, intelligence, artificial, ability |
Topic 19 | 0.010 | dynamic, change, behavior, individual, evolution, static, evolve, response |
Topic 63 | 0.010 | device, mobile, location, use, sensor, monitor, application, can |
Topic 6 | 0.010 | graph, direct, subgraph, show, parameterize, induce, bipartite, result |
Topic 62 | 0.010 | semantic, reason, relation, interpretation, express, description, trace, equivalence |
Topic 18 | 0.009 | dataset, video, visual, question, temporal, answer, action, task |
Topic 37 | 0.009 | word, translation, task, sentence, representation, recurrent, embed, embeddings |
Topic 30 | 0.009 | signal, sense, sparse, measurement, reconstruction, recovery, recover, sparsity |
Topic 76 | 0.009 | power, energy, consumption, load, grid, circuit, propose, design |
Topic 26 | 0.009 | matrix, kernel, vector, subspace, space, projection, norm, spectral |
Topic 16 | 0.009 | interference, channel, user, transmitter, receiver, primary, secondary, csi |
Topic 91 | 0.008 | search, query, database, index, retrieval, engine, result, answer |
Topic 40 | 0.008 | property, standard, check, verification, verify, formal, specification, ensure |
Topic 68 | 0.008 | input, generate, component, output, produce, generation, stage, two |
Topic 66 | 0.008 | control, task, schedule, controller, stability, real-time, switch, mode |
Topic 79 | 0.008 | mechanism, price, utility, market, auction, allocation, good, budget |
Topic 24 | 0.008 | function, value, loss, continuous, hash, sensitivity, integral, show |
Topic 81 | 0.008 | distribute, local, communication, global, failure, fault, consensus, reliability |
Topic 65 | 0.008 | language, natural, dependency, tag, use, grammar, parse, linguistic |
Topic 25 | 0.008 | strategy, game, equilibrium, player, nash, play, show, study |
Topic 92 | 0.008 | decision, rule, make, choice, candidate, choose, aggregate, vote |
Topic 36 | 0.007 | couple, spatial, cell, simulation, use, biological, material, brain |
Topic 42 | 0.007 | scheme, message, share, propose, broadcast, send, secret, key |
Topic 56 | 0.007 | cluster, pattern, mine, hierarchical, discover, similarity, discovery, propose |
Topic 96 | 0.007 | attack, security, can, detection, adversary, detect, malicious, attacker |
Topic 89 | 0.007 | node, inference, probabilistic, bayesian, propagation, influence, spread, graphical |
Topic 14 | 0.007 | information, context, side, mutual, available, source, can, additional |
Topic 39 | 0.007 | log, tree, omega, string, bit, size, use, give |
Topic 61 | 0.007 | map, mathcal, leave, alpha, mathbb, frac, let, right |
Topic 5 | 0.007 | condition, length, field, sufficient, family, necessary, generator, give |
Topic 32 | 0.006 | student, team, volume, university, course, collaborative, collaboration, member |
Topic 72 | 0.006 | store, digital, storage, access, file, library, write, use |
Topic 20 | 0.006 | web, content, user, cache, video, page, site, stream |
Topic 38 | 0.006 | structure, community, complex, structural, overlap, topological, underlie, identify |
Topic 85 | 0.006 | weight, distance, metric, binary, sum, ensemble, generalize, average |
Topic 22 | 0.006 | face, recognition, localization, variation, local, descriptor, person, facial |
Topic 43 | 0.006 | linear, vector, chain, markov, mix, integer, combination, quadratic |
Topic 33 | 0.006 | state, initial, configuration, transition, free, art, phase, net |
Topic 64 | 0.006 | match, rank, pair, list, stable, competitive, preference, assignment |
Topic 86 | 0.005 | computation, protocol, classical, quantum, communication, can, use, agreement |
Topic 90 | 0.005 | test, group, item, hypothesis, numb, case, bin, one |
Topic 58 | 0.005 | sequence, correlation, compression, number, shift, period, randomness, protein |
Topic 41 | 0.005 | type, object, recursive, dependent, different, primitive, generic, session |
Topic 1 | 0.005 | error, due, take, correct, crucial, modification, author, claim |
Topic 2 | 0.004 | path, long, short, note, frame, find, along, travel |
Topic 29 | 0.003 | policy, article, name, request, serve, copy, reward, push |
Topic 75 | 0.003 | region, partition, lattice, mu, lambda, nest, two, fine |
Topic 100 | 0.003 | new, paper, good, use, present, base, can, introduce |
Topic 8 | 0.003 | et, al, de, recently, work, se, la, recent |
Topic 3 | 0.003 | robust, hold, uncertainty, robustness, uncertain, swarm, artificial, ca |
4. Conclusion
Throughout this project, we worked with the text of abstracts published on arXiv. We transformed the text into numerical data and reduced its dimension with topic modeling, using a structural topic model fitted with the stm package.
Next time, we will attempt to cluster our topics and assess whether some topics are more closely related to one another than others.
Until then, drink water and stay safe🖖🏼.